最近的深度学习模型在言语增强方面已经达到了高性能。但是,获得快速和低复杂模型而没有明显的性能降解仍然是一项挑战。以前的知识蒸馏研究对言语增强无法解决这个问题,因为它们的输出蒸馏方法在某些方面不符合语音增强任务。在这项研究中,我们提出了基于特征的蒸馏多视图注意转移(MV-AT),以在时域中获得有效的语音增强模型。基于多视图功能提取模型,MV-AT将教师网络的多视图知识传输到学生网络,而无需其他参数。实验结果表明,所提出的方法始终提高瓦伦蒂尼和深噪声抑制(DNS)数据集的各种规模的学生模型的性能。与基线模型相比,使用我们提出的方法(一种用于有效部署的轻巧模型)分别使用了15.4倍和4.71倍(FLOPS),与具有相似性能的基线模型相比,Many-S-8.1GF分别达到了15.4倍和4.71倍。
translated by 谷歌翻译
最近,将变压器结构应用于图像分类任务的视觉变压器(VIV)具有优于卷积神经网络的优势。然而,使用诸如JFT-300M的大型数据集的预先训练的VIT结果的高性能和其对大型数据集的依赖性被解释为由于低地位感应偏差。本文提出了移动的贴片标记(SPT)和地区自我关注(LSA),有效解决了缺乏地区归纳偏差,使其即使在小型数据集上也能从划痕中学习。此外,SPT和LSA是通用且有效的附加模块,可轻松适用于各种VITS。实验结果表明,当SPT和LSA都应用于VITS时,性能在微小的想象中平均提高2.96%,这是一个代表性的小型数据集。特别是,由于所提出的SPT和LSA,Swin Transformer达到了4.08%的压倒性的性能提高。
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
This paper proposes Mutual Information Regularized Assignment (MIRA), a pseudo-labeling algorithm for unsupervised representation learning inspired by information maximization. We formulate online pseudo-labeling as an optimization problem to find pseudo-labels that maximize the mutual information between the label and data while being close to a given model probability. We derive a fixed-point iteration method and prove its convergence to the optimal solution. In contrast to baselines, MIRA combined with pseudo-label prediction enables a simple yet effective clustering-based representation learning without incorporating extra training techniques or artificial constraints such as sampling strategy, equipartition constraints, etc. With relatively small training epochs, representation learned by MIRA achieves state-of-the-art performance on various downstream tasks, including the linear/k-NN evaluation and transfer learning. Especially, with only 400 epochs, our method applied to ImageNet dataset with ResNet-50 architecture achieves 75.6% linear evaluation accuracy.
translated by 谷歌翻译
In this study, we propose a lung nodule detection scheme which fully incorporates the clinic workflow of radiologists. Particularly, we exploit Bi-Directional Maximum intensity projection (MIP) images of various thicknesses (i.e., 3, 5 and 10mm) along with a 3D patch of CT scan, consisting of 10 adjacent slices to feed into self-distillation-based Multi-Encoders Network (MEDS-Net). The proposed architecture first condenses 3D patch input to three channels by using a dense block which consists of dense units which effectively examine the nodule presence from 2D axial slices. This condensed information, along with the forward and backward MIP images, is fed to three different encoders to learn the most meaningful representation, which is forwarded into the decoded block at various levels. At the decoder block, we employ a self-distillation mechanism by connecting the distillation block, which contains five lung nodule detectors. It helps to expedite the convergence and improves the learning ability of the proposed architecture. Finally, the proposed scheme reduces the false positives by complementing the main detector with auxiliary detectors. The proposed scheme has been rigorously evaluated on 888 scans of LUNA16 dataset and obtained a CPM score of 93.6\%. The results demonstrate that incorporating of bi-direction MIP images enables MEDS-Net to effectively distinguish nodules from surroundings which help to achieve the sensitivity of 91.5% and 92.8% with false positives rate of 0.25 and 0.5 per scan, respectively.
translated by 谷歌翻译
在带有频划分双链体(FDD)的常规多用户多用户多输入多输出(MU-MIMO)系统中,尽管高度耦合,但已单独设计了通道采集和预编码器优化过程。本文研究了下行链路MU-MIMO系统的端到端设计,其中包括试点序列,有限的反馈和预编码。为了解决这个问题,我们提出了一个新颖的深度学习(DL)框架,该框架共同优化了用户的反馈信息生成和基础站(BS)的预编码器设计。 MU-MIMO系统中的每个过程都被智能设计的多个深神经网络(DNN)单元所取代。在BS上,神经网络生成试验序列,并帮助用户获得准确的频道状态信息。在每个用户中,频道反馈操作是由单个用户DNN以分布方式进行的。然后,另一个BS DNN从用户那里收集反馈信息,并确定MIMO预编码矩阵。提出了联合培训算法以端到端的方式优化所有DNN单元。此外,还提出了一种可以避免针对可扩展设计的不同网络大小进行重新训练的培训策略。数值结果证明了与经典优化技术和其他常规DNN方案相比,提出的DL框架的有效性。
translated by 谷歌翻译
由于处理非covex公式的能力,深入研究深度学习(DL)技术以优化多用户多输入单输出(MU-MISO)下行链接系统。但是,现有的深神经网络(DNN)的固定计算结构在系统大小(即天线或用户的数量)方面缺乏灵活性。本文开发了一个双方图神经网络(BGNN)框架,这是一种可扩展的DL溶液,旨在多端纳纳波束形成优化。首先,MU-MISO系统以两分图为特征,其中两个不相交的顶点集(由传输天线和用户组成)通过成对边缘连接。这些顶点互连状态是通过通道褪色系数建模的。因此,将通用的光束优化过程解释为重量双分图上的计算任务。这种方法将波束成型的优化过程分为多个用于单个天线顶点和用户顶点的子操作。分离的顶点操作导致可扩展的光束成型计算,这些计算不变到系统大小。顶点操作是由一组DNN模块实现的,这些DNN模块共同构成了BGNN体系结构。在所有天线和用户中都重复使用相同的DNN,以使所得的学习结构变得灵活地适合网络大小。 BGNN的组件DNN在许多具有随机变化的网络尺寸的MU-MISO配置上进行了训练。结果,训练有素的BGNN可以普遍应用于任意的MU-MISO系统。数值结果验证了BGNN框架比常规方法的优势。
translated by 谷歌翻译
在本文中,我们提出了Sanane-TTS,这是一种稳定且自然的端到端多语言TTS模型。由于很难为给定的演讲者获得多语言语料库,因此不可避免地会使用单语语料库进行多语言TTS模型。我们介绍了扬声器正规化损失,该损失可改善跨语性合成期间的语音自然性以及域对抗训练,该训练适用于其他多语言TTS模型。此外,通过添加扬声器正规化损失,以持续时间为零矢量嵌入的扬声器可以稳定跨语性推断。通过此替代品,我们的模型将产生以中等节奏的语音,而不论跨语性合成中的源说话者如何。在MOS评估中,Sane-TTS在跨语义和内部合成中的自然性得分高于3.80,地面真相评分为3.99。同样,即使在跨语性的推论中,Sane-TTS也保持了接近地面真理的说话者相似性。音频样本可在我们的网页上找到。
translated by 谷歌翻译
来自数据流的在线异常检测对于许多应用程序的安全性至关重要,但是由于来自IoT设备和基于云的基础架构的复杂且不断发展的数据流而面临严重的挑战。不幸的是,现有方法对这些挑战太短。在线异常检测方法承担着处理复杂性的负担,而离线深度异常检测方法则遭受了不断发展的数据分布的影响。本文介绍了一个在线深度异常检测的框架ARCU,可以与任何基于自动编码器的深度异常检测方法实例化。它使用两种新颖的技术使用自适应模型合并方法来处理复杂而不断发展的数据流:概念驱动的推理和漂移感知模型池更新;前者检测到最适合复杂性的模型组合的异常,后者会动态调整模型池以适合不断发展的数据流。在具有高维和概念拖延的十个数据集的全面实验中,Arcus提高了基于最先进的自动编码器的流媒体变体的异常检测准确性,并提高了最新的方法和最新的方法。 ART流动异常检测方法的分别为22%和37%。
translated by 谷歌翻译
现代消费电子设备已为其主要功能采用了深度学习的情报服务。供应商最近开始在设备上执行情报服务,以在设备中保存个人数据,降低网络和云成本。我们发现了通过使用用户数据更新神经网络的情况,而无需将数据暴露在设备中:设备培训。例如,我们可能会添加一个新课程,我的狗Alpha用于机器人真空吸尘器,适应用户口音的语音识别,让文本到语音说话,好像用户会说话。但是,目标设备的资源限制遇到了重大困难。我们建议NNTrainer,这是一个轻巧的设备培训框架。我们描述了NNTrainer实施的神经网络的优化技术,这些技术与传统一起评估。评估表明,NNTrainer可以将内存消耗降低至1/28,而不会恶化准确性或训练时间,并有效地个性化了对设备上的应用程序。 NNTrainer是跨平台和实用的开源软件,该软件正在作者隶属关系中部署到数百万个设备。
translated by 谷歌翻译